
Important Machine Learning Questions

17 multiple-choice questions (MCQs):

1. What is the primary goal of supervised learning?

a) Discovering hidden patterns in data
b) Minimizing the number of features
c) Learning a mapping from inputs to outputs using labeled data
d) Maximizing the number of clusters

Answer: c) Learning a mapping from inputs to outputs using labeled data


2. Which of the following is an example of unsupervised learning?

a) Linear regression
b) Decision trees
c) K-means clustering
d) Support vector machines

Answer: c) K-means clustering
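
Illustration (a minimal sketch, not a production implementation): K-means is unsupervised because it groups points without any labels. The example below runs the assign-then-update loop on one-dimensional data. The function name `kmeans_1d`, the sample points, and the starting centroids are assumptions chosen for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-means on 1-D data with caller-supplied starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# No labels are supplied, only the raw points.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

Real implementations usually pick starting centroids at random (or with a scheme such as k-means++) and stop when assignments no longer change.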


3. In machine learning, what does overfitting refer to?

a) A model that performs well on training data but poorly on test data
b) A model that generalizes well to new data
c) A model that ignores noise in the data
d) A model with low variance

Answer: a) A model that performs well on training data but poorly on test data


4. Which of the following is NOT a type of machine learning?

a) Supervised learning
b) Unsupervised learning
c) Reinforcement learning
d) Embedded learning

Answer: d) Embedded learning


5. What is the purpose of a loss function in machine learning?

a) To improve computational speed
b) To measure how well the model is performing
c) To increase the dataset size
d) To decrease the number of layers in a neural network

Answer: b) To measure how well the model is performing
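
Illustration (a minimal sketch): mean squared error is one common loss function for regression. A lower value means the predictions are closer to the targets. The sample values below are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0]
good_pred = [3.0, 5.0, 8.0]  # off by 1 on a single example
bad_pred = [0.0, 0.0, 0.0]   # ignores the targets entirely

good_loss = mean_squared_error(y_true, good_pred)
bad_loss = mean_squared_error(y_true, bad_pred)

# The better set of predictions gets the smaller loss.
print(good_loss < bad_loss)  # True
```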


6. Which of the following algorithms is commonly used for classification tasks?

a) K-means
b) Random forest
c) Principal Component Analysis (PCA)
d) Apriori Algorithm

Answer: b) Random forest


7. Which metric is commonly used to evaluate classification models?

a) Mean Absolute Error
b) R-squared
c) Accuracy
d) Root Mean Squared Error

Answer: c) Accuracy


8. What is the main purpose of Principal Component Analysis (PCA)?

a) To classify data
b) To reduce dimensionality
c) To cluster data
d) To detect anomalies

Answer: b) To reduce dimensionality


9. In reinforcement learning, what is an "agent"?

a) A function that maps inputs to outputs
b) A program that learns by interacting with an environment
c) A set of training data points
d) A loss function used for optimization

Answer: b) A program that learns by interacting with an environment


10. What does "Naïve" mean in the Naïve Bayes classifier?

a) It assumes features are conditionally independent given the class
b) It is not useful for classification
c) It is slow in training
d) It cannot handle large datasets

Answer: a) It assumes features are conditionally independent given the class


11. Which technique is commonly used to prevent overfitting in deep learning models?

a) Increasing the number of layers
b) Using dropout
c) Decreasing the learning rate
d) Removing training data

Answer: b) Using dropout


12. What is the purpose of a confusion matrix?

a) To determine whether data is balanced
b) To evaluate classification model performance
c) To visualize data distribution
d) To optimize hyperparameters

Answer: b) To evaluate classification model performance
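
Illustration (a minimal sketch for the binary case): a confusion matrix counts true positives, false positives, false negatives, and true negatives, and metrics such as accuracy, precision, and recall are derived from those counts. The function name `confusion_counts` and the sample labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(tp, fp, fn, tn)               # 3 1 1 3
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```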


13. Which of the following algorithms is designed specifically for anomaly detection?

a) Decision trees
b) K-Nearest Neighbors
c) Isolation Forest
d) Linear regression

Answer: c) Isolation Forest


14. Which of the following is a supervised learning algorithm?

a) K-means
b) DBSCAN
c) Random forest
d) Apriori

Answer: c) Random forest


15. Which optimization algorithm is commonly used for training deep learning models?

a) Newton's method
b) Genetic algorithm
c) Stochastic Gradient Descent (SGD)
d) Apriori algorithm

Answer: c) Stochastic Gradient Descent (SGD)
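
Illustration (a minimal sketch, far simpler than training a deep network): stochastic gradient descent updates the parameters after each training example rather than after the full dataset. The example fits a line y = w*x + b to noise-free data. The function name `sgd_linear_fit`, the learning rate, and the number of passes over the data are assumptions chosen so this toy problem converges.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.02, epochs=300, seed=0):
    """Fit y = w*x + b by SGD on the squared error, one example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each pass
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of error**2 with respect to w and b.
            w -= lr * 2 * error * xs[i]
            b -= lr * 2 * error
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # generated from w = 2, b = 1

w, b = sgd_linear_fit(xs, ys)
print(round(w, 2), round(b, 2))  # 2.0 1.0
```

In deep learning the same update rule is usually applied to mini-batches, with gradients computed by backpropagation.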


16. What is the curse of dimensionality?

a) The problem of having too few data points
b) The problem of data becoming sparse and computation more costly as the number of dimensions grows
c) The difficulty of choosing an appropriate activation function
d) The overfitting problem in small datasets

Answer: b) The problem of data becoming sparse and computation more costly as the number of dimensions grows


17. What is an epoch in deep learning?

a) A single pass of the entire dataset through the model during training
b) The process of splitting data into training and testing sets
c) The removal of redundant features
d) The evaluation of the model on test data

Answer: a) A single pass of the entire dataset through the model during training


Short questions


Below are 18 open-ended machine learning questions. Each is worth 5 marks and calls for a detailed answer.


  1. Explain the difference between supervised and unsupervised learning.

  2. What is overfitting in machine learning, and what strategies can be employed to prevent it?

  3. Discuss the importance of feature engineering in building effective machine learning models.

  4. Describe the process of cross-validation and explain why it is significant in model evaluation.

  5. Explain the role of a loss function in training machine learning models.

  6. Compare and contrast decision trees with random forests in terms of their methodology and performance.

  7. What are neural networks, and how do they contribute to the field of deep learning?

  8. Describe the concept of reinforcement learning and provide an example of its application.

  9. How does dimensionality reduction improve machine learning models, and what techniques are commonly used for this purpose?

  10. Discuss the impact of hyperparameter tuning on the performance of machine learning models.

  11. Discuss how regularization techniques such as L1 and L2 help control overfitting.

  12. Describe how gradient descent optimizes model parameters during training.

  13. Explain the importance of feature scaling in machine learning.

  14. Discuss how cross-entropy loss functions are used in classification tasks.

  15. Examine the role of data preprocessing in the machine learning pipeline.

  16. Describe the concept of transfer learning and its benefits in deep learning.

  17. Explain the process of hyperparameter optimization using techniques such as grid search and random search.

  18. Discuss the challenges and considerations involved in deploying machine learning models in production.


Long-form exam questions


Question 1:
a) Define supervised learning and unsupervised learning.
b) Compare and contrast their key characteristics, including typical applications and limitations.


Question 2:
a) Define overfitting and underfitting in the context of machine learning.
b) Explain the causes and consequences of each, and outline at least three strategies to prevent overfitting.


Question 3:
a) Define feature engineering and explain its importance in building machine learning models.
b) Describe two methods of feature selection or extraction (e.g., filter methods, wrapper methods, or PCA) and explain how they contribute to model performance.


Question 4:
a) Define hyperparameters in machine learning and discuss their role in model training.
b) Compare grid search and random search methods for hyperparameter optimization, detailing the advantages and disadvantages of each approach.


Question 5:
a) Define ensemble learning and describe its underlying principles.
b) Compare two ensemble methods (such as bagging and boosting), explaining how they combine multiple models and under what circumstances ensemble techniques are particularly effective.